English-Spanish Large Statistical Dictionary of Inflectional Forms
Abstract
The paper presents an approach for constructing a weighted bilingual dictionary of inflectional forms that takes a traditional bilingual dictionary, rather than parallel corpora, as input. The algorithm generates all possible morphological (inflectional) forms and weights them using the distribution of the corresponding grammar sets (grammatical information) in large corpora for each language. It also takes into account the compatibility of grammar sets across the language pair; for example, a verb in the past tense in language L is normally expected to be translated by a verb in the past tense in language L′. We consider the method universal, i.e., applicable to any pair of languages. The resulting dictionary is freely available and can be used in several NLP tasks, such as statistical machine translation.
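The weighting idea described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function name `weight_form_pairs`, the tag labels, the frequency values, the compatibility scores, and the choice to multiply the three factors and normalize are all assumptions made for illustration only.

```python
from itertools import product


def weight_form_pairs(src_forms, tgt_forms, src_freq, tgt_freq, compat):
    """Weight every (source form, target form) pair of one dictionary entry.

    src_forms / tgt_forms: dict mapping a grammar set (tag) to an inflected form.
    src_freq / tgt_freq:   relative corpus frequency of each grammar set.
    compat:                dict mapping (source tag, target tag) to a
                           compatibility score in [0, 1].
    The product-and-normalize scoring is an assumed scheme, not the paper's.
    """
    scores = {}
    for (s_tag, s_form), (t_tag, t_form) in product(
        src_forms.items(), tgt_forms.items()
    ):
        score = (
            src_freq.get(s_tag, 0.0)
            * tgt_freq.get(t_tag, 0.0)
            * compat.get((s_tag, t_tag), 0.0)
        )
        if score > 0:
            pair = (s_form, t_form)
            # Distinct tags can share a surface form, so accumulate.
            scores[pair] = scores.get(pair, 0.0) + score
    total = sum(scores.values())
    # Normalize so the weights of one entry sum to 1.
    return {pair: s / total for pair, s in scores.items()} if total else {}


# Hypothetical toy data for the entry walk -> caminar (all numbers invented).
english = {"PRES": "walk", "PAST": "walked"}
spanish = {"PRES": "camina", "PAST": "caminó"}
en_freq = {"PRES": 0.6, "PAST": 0.4}
es_freq = {"PRES": 0.5, "PAST": 0.5}
tense_compat = {
    ("PRES", "PRES"): 1.0,
    ("PAST", "PAST"): 1.0,
    ("PRES", "PAST"): 0.1,
    ("PAST", "PRES"): 0.1,
}

weights = weight_form_pairs(english, spanish, en_freq, es_freq, tense_compat)
for pair, w in sorted(weights.items(), key=lambda kv: -kv[1]):
    print(pair, round(w, 3))
```

With this toy data, pairs that agree in tense receive most of the weight, which mirrors the past-tense example given in the abstract.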
Related resources
Design and Evaluation of Inflectional Stemmer for Bulgarian
The paper starts with an overview of some important approaches to stemming for English and other languages. Then, the design, implementation and evaluation of the BulStem inflectional stemmer for Bulgarian are presented. The problem is addressed from a machine-learning perspective using a large morphological dictionary. A detailed automatic evaluation in terms of under-stemming, over-stemming and...
Using POS Information for Statistical Machine Translation into Morphologically Rich Languages
When translating from languages with hardly any inflectional morphology, like English, into morphologically rich languages, the English word forms often do not contain enough information for producing the correct full form in the target language. We investigate methods for improving the quality of such translations by making use of part-of-speech information and maximum entropy modeling. Results fo...
A Comparison of Productive Vocabulary in Chinese and American Advanced English Learners’ Academic Writings
A comparison has been made of productive vocabulary in theses by English majors at a normal university in China and in papers by American final-year undergraduates. The research demonstrates that, with the word family as the unit of measurement, Chinese students use proportionally fewer words from the 1st and the 8th to 10th 1000-word frequency levels than American students, while in terms of the 2nd to 4th 1000 frequenc...
Interactive Translation of Conversational Speech
We present JANUS-II, a large scale system effort aimed at interactive spoken language translation. JANUS-II now accepts spontaneous conversational speech in a limited domain in English, German or Spanish and produces output in German, English, Spanish, Japanese and Korean. The challenges of coarticulated, disfluent, ill-formed speech are manifold, and have required advances in acoustic modeling...
BulStem: Design and Evaluation of Inflectional Stemmer for Bulgarian
The paper starts with an overview of some important approaches to stemming for English and other languages. Then, the design, implementation and evaluation of BulStem – a freely available inflectional stemmer for Bulgarian, are presented. The problem is addressed from a machine-learning perspective using a large morphological dictionary. A detailed automatic evaluation in terms of under-stemmin...